Custom Bluesky Feeds
Bluesky is the new hot social media site and this time things actually look interesting. It’s run by a public benefit company and one of the key features is algorithmic choice. The biggest way Bluesky provides algorithmic choice is by letting anybody create custom feeds. Not only can you choose which feeds appear in your app, even supplanting the default feeds if you want, you can create entirely custom feeds yourself.
This post will have two main sections. The first is for people interested in using the feeds I made. The second section is mainly for other developers interested in creating their own custom Bluesky feeds or working with Bluesky’s AT Protocol in general – or non-devs that are simply interested in the more technical aspects of how the feeds work.
Using the Feeds #
If you’re interested in using the feeds, I’ll walk you through how to add them and also what terms to use if you want your posts to show up in them. I’ll also describe what sort of things help determine the order posts show up in within the feeds.
How to get the feeds in your Bluesky app #
There are a few ways to find the feeds. First, here are the direct links to each feed:
From the top right, you’ll be able to pin to home. This will add it to the top of Bluesky so you can tab over and view each feed.
Once you have them pinned, you can also change the order that your feeds appear in at the top of the app. To do this, go to Feeds on the side menu and then tap the Gear icon at the top right – from there, you can tap the up and down arrows to change the order to your liking.
You can also find them on my profile under the Feeds tab. And finally, you can search for them using Bluesky’s search and then tap the Feeds tab in the search results to find them there.
Why did I make these feeds? #
Most custom feeds are simply filtering posts by keywords and presenting them chronologically, but what I personally liked most about the FGC on Twitter was when they’d share short clips showing tips and tricks or exceptional matches. So the goal of these feeds is to surface the most helpful content that you can use to improve at the games. That’s also why they target specific games and not their entire series. My hope is that people will subscribe to these feeds to find this sort of content, like/repost those posts, and incentivize more people to post helpful content for these games. The more content being shared, the better these feeds will be!
Stuff like fanart is not what these feeds try to capture – you’ll likely see some of it on the feeds, but it will generally be farther down. They also try to filter out NSFW content. If that’s what you’re after, I encourage you to create your own feeds for that purpose.
How are posts picked up by the feeds? #
Each game’s feed has a list of terms they look for. This is usually just variations of the game’s name. If the game has official accounts or dedicated fansite accounts on Bluesky (I refer to them as VIP accounts), I also include those posts even if the posts do not have any of the matching terms (feel free to let me know if I’m missing any).
I try to also include the Japanese names of each game. Because these feeds prioritize video content, the language barrier is less of an issue (and I expect Bluesky’s translation tools will only get better). If there are global communities talking about these games and their posts aren’t getting picked up by my feeds for some reason, let me know. And if you do want to limit which languages you see in the feeds, you can set that for yourself in Settings > Languages > Content Languages.
If you’re interested, farther down in the dev section of this post I go into more detail about how I filter and capture posts relevant to each feed. I continue to work on minimizing false positives, but nothing’s perfect.
Terms being matched #
The feeds search both the post body and any alt text in images/videos. As of the time of this writing, these are the terms watched for each game (not case sensitive):
Street Fighter 6 #
- sf6
- street fighter 6
- StreetFighter6
- #SF6_ (this catches character hashtags like #SF6_RYU, #SF6_CHUN etc)
- スト6
- ストリートファイター6
Virtua Fighter #
- vf5
- vf6 (will be updated when the game has an official title)
- virtua fighter
- virtuafighter
- vfrevo
- vfes
- バーチャファイター
Fatal Fury: City of the Wolves #
- cotw
- cityofthewolves
- city of the wolves
2XKO #
- 2XKO (thank you for having a short and unique title, Riot)
What determines the order the posts are shown in? #
Each post is scored on content, engagement, and age. Each game feed is tuned to score things a bit differently – that’s mainly because, of the current game feeds, Street Fighter 6 is the only one that is already released (though people do share clips from older Virtua Fighter games). The unreleased games don’t have many people posting about them yet. After 2XKO, CotW, and VF5 REVO release (or during open betas), I’ll bring their scoring parameters closer to the SF6 feed’s. Until then, the unreleased games’ feeds are closer to a chronological view, with only small boosts for video and for being posted by VIP accounts. I’m excited for these feeds to mature as their games come out – I’ve already learned lots of cool tech from the SF6 feed.
But even after all the games are out, I’ll continuously be tweaking these parameters to try and improve the feeds as Bluesky continues to grow. Because the parameters are in flux, I’ll just use color circles to give you a rough idea of how things are scored.
Content #
- 🟢🟢🟢🟢 big boost for having native video
- 🟢🟢🟢 boost for being posted by VIP accounts
- 🟢🟢 boost for having a link
- 🟢 small boost for including alt text for any images/videos
- ⚪️ baseline post with nothing but text
- 🔴 small penalty for being a reply to another post
- 🔴🔴 penalty for having an image
- 🔴🔴🔴🔴🔴 heavy penalty for NSFW
Currently all links are treated the same, but I’ve played around with boosting youtube/twitch links over other links and may revisit that in the future.
Content penalties can be overcome by being boosted by other scores. For example, if there’s a text post and you reply with a video post, your overall content score will be higher. And even things heavily penalized on content can be brought back up if they get enough engagement score.
Engagement #
What you’d expect
- 🟢🟢🟢🟢 reposts
- 🟢🟢🟢 quoted
- 🟢🟢 likes
- 🟢 replies
Age #
Posts get a penalty as they age so that newer posts can take the spotlight. Pretty simple.
Right now engagement levels for posts, on average, are pretty low. As Bluesky grows, I could see this changing. If the feeds are ever dominated by high engagement posts, it might be worth trying to capture older posts that only recently went viral.
I’m learning as I go with the balance of the different scores and continue to tweak things.
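If it helps to see the shape of this, here’s a rough Python sketch of how the three scores could be combined. Every number and name in it is a made-up placeholder that only mirrors the relative sizes of the circles above. The real parameters differ per feed and keep changing.

```python
from dataclasses import dataclass, field

# Hypothetical weights for illustration only, not the feeds' real values.
CONTENT_WEIGHTS = {
    "video": 4.0,
    "vip": 3.0,
    "link": 2.0,
    "alt_text": 1.0,
    "reply": -1.0,
    "image": -2.0,
    "nsfw": -5.0,
}
ENGAGEMENT_WEIGHTS = {"reposts": 4.0, "quotes": 3.0, "likes": 2.0, "replies": 1.0}
AGE_PENALTY_PER_HOUR = 0.5  # assumed linear decay; the real curve may differ


@dataclass
class Post:
    traits: set = field(default_factory=set)  # e.g. {"video", "alt_text"}
    reposts: int = 0
    quotes: int = 0
    likes: int = 0
    replies: int = 0
    age_hours: float = 0.0


def score(post: Post) -> float:
    """Content plus engagement, minus a penalty that grows with age."""
    content = sum(CONTENT_WEIGHTS[trait] for trait in post.traits)
    engagement = sum(
        weight * getattr(post, name) for name, weight in ENGAGEMENT_WEIGHTS.items()
    )
    return content + engagement - AGE_PENALTY_PER_HOUR * post.age_hours
```

With these placeholder weights, a video reply still outscores a plain text post of the same age, and enough reposts can pull a penalized post back up.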
Will I make a feed for game X or Y? #
Maybe! I’ll probably be adding new feeds in the future for my personal interests. If there’s enough demand for other games or topics, I’ll consider it – so let me know if there’s something you want. The hosting requirements for these feeds aren’t nothing, so I’m unlikely to want to set up and maintain a bunch of feeds for things that don’t interest me.
One consideration is how easy it is to filter for the content. For example, I played with making a feed for Valve’s new game Deadlock that I’ve been playing. Unfortunately the game’s title is also just a fairly common word, so there’s not an easy way to isolate posts about the game (unless Bluesky users are organized about using hashtags). I discuss these challenges more in the dev section later in this post.
Developing Bluesky Feeds #
This section is mostly intended for other developers interested in making a custom feed or working with Bluesky’s AT Protocol in general, but if you’re not a developer you may still find this stuff interesting! It walks through some of the big challenges and takeaways I encountered making these.
Shorten your Display Names! #
Early on I thought it was a bit weird for feed Display Names to be limited to 24 characters. It’s really not much room at all! But it really does make a lot of sense and I actually ended up shortening my feed Display Names even more than necessary. Hopefully this image demonstrates why:
Short names are a huge boost to usability! The shorter the names, the less the user has to scroll to switch between feed views. I recommend every custom feed try to minimize how much space it takes up on people’s valuable top bar real estate. The biggest downside I can think of is that short names may show up a bit worse in search results, but people only see that once – they have to deal with your superfluous Display Names every time they open Bluesky, so I think short names easily win out. Include the full name in your description and your feeds will still show up when people search for the full name.
Finding matching posts and filtering stuff you don’t want #
Careful what terms you match with #
Try your best to find terms that are exclusive to your topic. I started off also matching with some character names that I thought were only used when talking about the game, but I would quickly learn that was not the case and find irrelevant posts in my feeds. When I look at other feeds for research, I notice a lot of false positives for similar reasons.
To some extent it’s a tradeoff you can use your own judgement on. Maybe you’re ok with more unrelated posts slipping into your feed. But ideally, I hope that custom feeds become so ubiquitous that users are more intentional about authoring their posts to match the intended feeds (at least when the feeds are for specific topics).
You could also provide a way for you to remove unwanted posts from the feed. Maybe it’s just another endpoint on your server that you send post URLs to. Or if you want to get fancy you could set up a labeling service on Bluesky, then you (and others) could report the unwanted post to that labeler through the Bluesky interface and have that trigger removal… but either way that would require more time from you (either removing them yourself or making sure the posts others are reporting actually should be removed so it’s not abused).
You might also think to use LLMs or some other ML to determine relevance to your feed, but considering the performance requirements, that seems like it could be expensive to implement.
A list of unwanted terms can be helpful #
After matching against terms, I also check to see if it matches against any unwanted terms so I can throw those posts out. This helps with a few scenarios.
- the term you want to match against may be a part of a bigger word – you can match against the smaller term and then unmatch if it’s within the encompassing term (there are other ways to accomplish this, but this is a quick way).
- Example: capture “SF6” (Street Fighter 6) but not “SF64” (Star Fox 64), while also still being able to capture something like #SF6_RYU (a valid, character-specific hashtag we do want to capture).
- a somewhat ambiguous term can effectively be made more specific
- for example, Fatal Fury: City of the Wolves uses “CotW” as shorthand. But “CotW” is also used for “Call of the Wild” and various “C__ of the week” phrases and hashtags. So you can look for these other terms and phrases to throw out posts and try and narrow down to what you’re actually trying to capture.
- sometimes people are posting things with random strings that might include your wanted terms. Promo codes, coupon codes, crypto wallets, etc. These posts often include terms you can use to filter out, but try to take care to ensure you’re not filtering out things related to your feeds.
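Here’s a minimal Python sketch of the match-then-unmatch idea, using the SF6 example from above. The function name and the contents of the unwanted list are assumptions for illustration, and the real lists are longer.

```python
WANTED_TERMS = ("sf6", "street fighter 6", "streetfighter6")
# Hypothetical unwanted list: longer words that happen to contain a wanted term.
UNWANTED_TERMS = ("sf64",)


def matches_feed(text: str) -> bool:
    """Match on any wanted term, then throw the post out on any unwanted term."""
    lowered = text.lower()
    if not any(term in lowered for term in WANTED_TERMS):
        return False
    return not any(term in lowered for term in UNWANTED_TERMS)
```

The tradeoff with a check this simple is that a post mentioning both SF6 and SF64 gets thrown out too. That’s the price of keeping the check quick.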
Quotes and replies provide useful context #
In most cases, we can assume posts that reply to a relevant post are also relevant to the topic of the feed. Same goes for posts that quote a relevant post. This is an easy way to capture more posts that would otherwise be missed if they don’t use the watched terms. The Jetstream provides all the info you need to implement this. For replies, it includes the reply root cid. For quoted posts, you can get the quoted post’s cid from the record embed.
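As a rough sketch, pulling those cids out of a post record could look like the Python below. It assumes the record is the already-parsed JSON dict of an app.bsky.feed.post record, and that you keep a set of cids for posts already in your feed. The function names and `tracked_cids` are made up for the example.

```python
from typing import Optional


def reply_root_cid(record: dict) -> Optional[str]:
    """cid of the thread root this post replies to, if it is a reply."""
    reply = record.get("reply") or {}
    return (reply.get("root") or {}).get("cid")


def quoted_cid(record: dict) -> Optional[str]:
    """cid of the quoted post, for plain quotes and quotes with media."""
    embed = record.get("embed") or {}
    kind = embed.get("$type")
    if kind == "app.bsky.embed.record":
        return (embed.get("record") or {}).get("cid")
    if kind == "app.bsky.embed.recordWithMedia":
        # The quote is nested one level deeper when media is attached.
        inner = (embed.get("record") or {}).get("record") or {}
        return inner.get("cid")
    return None


def is_relevant_by_context(record: dict, tracked_cids: set) -> bool:
    """True if the post replies to or quotes a post we already track."""
    return reply_root_cid(record) in tracked_cids or quoted_cid(record) in tracked_cids
```

The same references also carry the post’s uri, so you could key on that instead if that’s how you store posts.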
Terms within URLs or usernames #
Some unrelated URLs would by chance include the short terms I was trying to match with. Same with certain usernames. If someone’s username happens to include a term you’re tracking, you don’t want every reply to their posts to show up on your feed.
I considered using regex to ignore terms within URLs (or check for surrounding whitespace etc), but in an effort to keep things simple (and also for performance considerations), I opted instead to just utilize the Facets provided by Jetstream.
For both URLs and user tags, the Facet provides the start and end index for where that text is located within the body. So I just strip URLs and user tags from the text body before matching terms against it.
It’s not a perfect solution, but it works most of the time and I haven’t seen anything slip by yet. One reason it’s not foolproof: the nature of AT Protocol (or any open protocol like this). Nothing is stopping someone from pushing something to the firehose that has malformed facet data that gets the indexes wrong. So you need to handle this in terms of exception handling (like if they say the facet begins outside the range of the text or something), but it’s not something you can make 100% dependable.
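Here’s a minimal sketch of that stripping step in Python. One detail worth knowing: facet indexes are byte offsets into the UTF-8 text, not character offsets, so the sketch works on bytes. The function name is made up, and blanking the span with spaces (rather than deleting it) is just one way to do it.

```python
STRIP_TYPES = {
    "app.bsky.richtext.facet#link",
    "app.bsky.richtext.facet#mention",
}


def strip_links_and_mentions(text: str, facets: list) -> str:
    """Blank out link and mention spans so terms inside them can't match."""
    raw = bytearray(text.encode("utf-8"))
    for facet in facets or []:
        try:
            start = facet["index"]["byteStart"]
            end = facet["index"]["byteEnd"]
            types = {feature["$type"] for feature in facet["features"]}
        except (KeyError, TypeError):
            continue  # malformed facet: ignore it rather than crash
        if not (isinstance(start, int) and isinstance(end, int)):
            continue
        if not 0 <= start < end <= len(raw):
            continue  # facet claims a range outside the text
        if types & STRIP_TYPES:
            raw[start:end] = b" " * (end - start)
    # A bad facet could cut a multi-byte character in half, so drop stray bytes.
    return raw.decode("utf-8", errors="ignore")
```

Overwriting with spaces keeps every other facet’s offsets valid, so overlapping or out-of-order facets don’t need special handling.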
Filtering out NSFW Content #
The Jetstream does provide labels, but only if those labels were part of the post when the post was created. That means you can easily filter out posts that the author marked as NSFW, but filtering out posts that the moderation team later labels NSFW requires more work.
Specifically, you have to subscribe to the label stream and watch to see if any new labels match a post you’re tracking. Yet another feed you need to subscribe to (see Performance section below).
Sometimes authors won’t label their post as NSFW but will use terms that indicate the NSFW nature of the post, so you can also do that to help catch more.
If a post is replying to or quoting a NSFW post, I also mark that post as NSFW even if it itself doesn’t have NSFW content.
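The checks that only need the post itself can be sketched like this in Python. It assumes self-labels sit under the record’s `labels.values` list, and the label values and term list here are my best understanding and a placeholder respectively, so verify them before relying on this. Labels added later by moderation aren’t covered.

```python
# Assumed self-label values; double check against the current Bluesky docs.
NSFW_LABELS = {"porn", "sexual", "nudity", "graphic-media"}
# Placeholder term list; a real one would be longer and tuned per feed.
NSFW_TERMS = ("nsfw",)


def self_labels(record: dict) -> set:
    """Label values the author attached when creating the post."""
    labels = record.get("labels") or {}
    return {value.get("val") for value in labels.get("values", [])}


def is_nsfw(record: dict, text: str, parent_is_nsfw: bool = False) -> bool:
    """Flag by inherited status, author self-label, or telltale terms."""
    if parent_is_nsfw:
        return True  # replies to and quotes of NSFW posts inherit the flag
    if self_labels(record) & NSFW_LABELS:
        return True
    lowered = text.lower()
    return any(term in lowered for term in NSFW_TERMS)
```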
On this topic, a lot of feeds can probably not bother filtering out NSFW content at all and just leave it up to the client to set the options they want. But I think for certain feeds, including mine, it makes sense – the kind of stuff some people post for some fandoms is, uh, extreme.
Performance #
Creating a custom feed for Bluesky, especially one that tracks engagement (likes/reposts/replies etc), is not trivial in terms of performance considerations. I’ll touch on a couple of the bigger lessons learned.
I do hope that as Bluesky continues to grow, the cost of running custom feeds doesn’t get too onerous. I’m not sure what it would look like, but it would be nice if it was feasible to have a custom feed without having to subscribe to all these feeds where the vast majority of data coming through is not related.
Data Sources #
I started from MarshalX’s python Feed Generator example, which uses the Firehose. It’s a realtime feed of almost everything that happens on Bluesky (and even some activity outside Bluesky that lives on the AT Protocol, if I understand correctly). That’s a lot of data and it was straining my cheap webhost so I looked for alternatives.
First I played with on demand solutions. There were a couple neat tools that let you use duckdb queries, but they were slow when I tested and I wasn’t sure if I could rely on them long term.
I finally migrated to using the Bluesky Jetstream and saw a huge reduction in resources used.
For chronological feeds, this is what you can expect. The big difference is that you can selectively subscribe to specific feeds – in this case, I hadn’t implemented any engagement scoring and I was only subscribing to app.bsky.feed.post. The load mostly caught back up to Firehose once I implemented engagement scoring and had to also subscribe to app.bsky.feed.like and app.bsky.feed.repost (as you might expect, people like posts more than they post/repost posts).
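The selective subscription is just a matter of query parameters on the Jetstream websocket URL. Here’s a small Python sketch that builds that URL. The hostname is an assumption (one of the public instances I’m aware of), so check the Jetstream README for the current list. `jetstream_url` is a made-up helper name.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode

# Assumed public instance; check the Jetstream README for current hosts.
JETSTREAM_ENDPOINT = "wss://jetstream2.us-east.bsky.network/subscribe"


def jetstream_url(collections: Iterable[str], cursor: Optional[int] = None) -> str:
    """Build a subscribe URL limited to the given record collections.

    cursor is a unix timestamp in microseconds to resume or backfill from.
    """
    params = [("wantedCollections", name) for name in collections]
    if cursor is not None:
        params.append(("cursor", str(cursor)))
    return f"{JETSTREAM_ENDPOINT}?{urlencode(params)}"


# Chronological feed: posts only.
posts_only = jetstream_url(["app.bsky.feed.post"])
# Engagement scoring: likes and reposts too, which is where the load comes back.
with_engagement = jetstream_url(
    ["app.bsky.feed.post", "app.bsky.feed.like", "app.bsky.feed.repost"]
)
```

You’d hand the resulting URL to whatever websocket client you use.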
Alternatively, you could instead occasionally use the getPosts endpoint to hydrate the posts and get the likes/reposts etc – but the big downside here is the 25 post limit for the endpoint. I was already tracking hundreds of posts in my feeds and that number will only grow as Bluesky grows, so I settled on subscribing to the like/repost feeds as the likely most sustainable solution.
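To give a feel for what that limit means in practice, here’s a tiny Python sketch of the batching you’d need. The helper name and the example URIs are made up, and the actual request is left out.

```python
from typing import Iterator, List

GET_POSTS_LIMIT = 25  # max post URIs per getPosts request


def batches(uris: List[str], size: int = GET_POSTS_LIMIT) -> Iterator[List[str]]:
    """Split tracked post URIs into request-sized chunks."""
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


# Hypothetical example: 300 tracked posts means 12 requests per refresh.
tracked = [f"at://did:plc:example/app.bsky.feed.post/{n}" for n in range(300)]
requests_needed = sum(1 for _ in batches(tracked))
```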
Filter posts as quickly as possible #
The Jetstream servers will cut your connection if you are processing entries too slowly and fall behind. This is especially important if you’re using an older cursor to get posts from the past to backfill your feed (which I sometimes do if I make big changes).
The vast majority of posts coming in will probably be unrelated to your feeds, so optimize your code to throw unrelated posts out asap. Py-spy (for python) is super helpful for seeing what parts of your code are taking the most time. If you’re also using the example python feed generator project as a base, definitely be careful with how you’re using Peewee objects (the ORM used in the example) as that was a major bottleneck I had to fix. I’ll likely replace Peewee with SQLAlchemy if I continue to develop these feeds further (but I’ll need to profile SQLAlchemy too).
My feeds have a multi pass process for matching posts. Quicker methods do a first pass to see if they could match any feed at all – only then do slower methods do further filtering for individual feeds and more complex logic. Processing a single post that does match a feed takes a bit longer than it could, but processing posts that do not match any feeds takes very little time – and the vast majority of posts will fall into the latter category.
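A minimal Python sketch of the two-pass idea is below. Using a single precompiled regex for the quick pass is my assumption for the example, not necessarily what the feeds do, and the term lists are trimmed down.

```python
import re

# Trimmed-down term lists for illustration.
FEED_TERMS = {
    "sf6": ("sf6", "street fighter 6", "streetfighter6"),
    "2xko": ("2xko",),
}

# First pass: one precompiled pattern covering every term of every feed.
ANY_TERM = re.compile(
    "|".join(re.escape(term) for terms in FEED_TERMS.values() for term in terms),
    re.IGNORECASE,
)


def feeds_for(text: str) -> list:
    """Return the feeds a post matches, bailing out early for unrelated posts."""
    if not ANY_TERM.search(text):
        return []  # the common case: nothing else runs for this post
    # Second pass: slower per-feed logic (unwanted terms etc. would go here).
    lowered = text.lower()
    return [
        feed
        for feed, terms in FEED_TERMS.items()
        if any(term in lowered for term in terms)
    ]
```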
Pagination cursor when feed isn’t chronological #
Handling the cursor for pagination is pretty straightforward when feeds are chronological, but becomes trickier when the order of the posts is algorithmically driven.
Take this simplified example:
- User loads feed, Post_ABC shows up on page 1
- User puts their phone down for a couple minutes, then picks it back up
- User scrolls down and the app wants to load more posts from the feed, but by this time, Post_ABC has been pushed down by newer posts and now it’s served again on page 2.
You don’t want to show Post_ABC twice (though I think the Bluesky app will just hide it client side), but more generally, you want the feed to appear consistent as the user scrolls through it and loads more posts, even though the order of the posts in the feed is constantly changing.
Following advice from someone on the Bluesky API Touchers Discord channel, I store static caches of the feed based on when they were originally queried. Basically works like this:
- for a new, fresh query, I take a snapshot of the feed and cache it. I use the current timestamp as the lookup key.
- the timestamp’s resolution is reduced (Bluesky uses microseconds) so that if multiple people query the feed at near the same time, they can share the same cache
- when a client asks for more posts farther down the queue, it sends both that original timestamp (so it knows what cache to use) and an offset (so it knows how far down the feed to fetch posts from)
This way, when they load the feed, they can then scroll and scroll and scroll to look farther down the feed (in this case, to see older and lower scoring posts). Refreshing will load a fresh, updated feed starting at the top.
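The snapshot approach can be sketched roughly like this in Python. The bucket size, the `timestamp:offset` cursor format, and the in-memory dict are all assumptions for the example. A real version also needs to evict old snapshots and validate the cursor it gets back from clients.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

BUCKET_SECONDS = 10  # hypothetical resolution so nearby queries share a snapshot
_snapshots: Dict[int, List[str]] = {}


def get_page(
    cursor: Optional[str],
    limit: int,
    rank: Callable[[], List[str]],
    now: Optional[float] = None,
) -> Tuple[List[str], Optional[str]]:
    """Serve one page from a frozen snapshot of the ranked feed."""
    if cursor is None:
        # Fresh query: snapshot the current ranking under a rounded timestamp.
        stamp = int(now if now is not None else time.time())
        key = stamp - stamp % BUCKET_SECONDS
        offset = 0
        if key not in _snapshots:
            _snapshots[key] = rank()
    else:
        # Follow-up query: the cursor says which snapshot and how far down.
        key_text, offset_text = cursor.split(":")
        key, offset = int(key_text), int(offset_text)
    snapshot = _snapshots.get(key, [])
    page = snapshot[offset:offset + limit]
    next_offset = offset + len(page)
    next_cursor = f"{key}:{next_offset}" if next_offset < len(snapshot) else None
    return page, next_cursor
```

Here `rank` stands in for whatever produces the current, constantly changing order. Because page 2 comes from the same snapshot as page 1, a post can’t get pushed down into it and show up twice.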
Hope all this is helpful!